Separate or joint? Estimation of multiple labels from crowdsourced annotations
Authors
Abstract
Artificial intelligence techniques that aim to simulate human comprehension more naturally fit the paradigm of multi-label classification. Training a multi-label classifier generally requires a large amount of high-quality multi-label data, and creating such datasets is usually expensive and time-consuming. A lower-cost way to obtain multi-label datasets for such comprehension-simulation techniques is to use noisy crowdsourced annotations. We propose incorporating label dependency into the label-generation process to estimate the multiple true labels for each instance given crowdsourced multi-label annotations. Three statistical quality control models based on the work of Dawid and Skene are proposed. The label-dependent DS (D-DS) model simply incorporates dependency relationships among all labels. The label pairwise DS (P-DS) model groups labels into pairs to prevent interference from uncorrelated labels. The Bayesian network label-dependent DS (ND-DS) model compactly represents label dependency using conditional independence properties to overcome the data sparsity problem. Results of two experiments, "affect annotation for lines in story" and "intention annotation for tweets", show that (1) the ND-DS model most effectively handles the multi-label estimation problem with annotations provided by only about five workers per instance and that (2) the P-DS model is best if there are pairwise comparison relationships among the labels. In summary, flexibly using label dependency to obtain multi-label datasets is a promising way to reduce the cost of data collection for future applications with minimal degradation in the quality of the results. © 2014 Elsevier Ltd. All rights reserved.
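For orientation, the following is a minimal sketch of the classic Dawid and Skene estimator that the three proposed models build on. It is not the D-DS, P-DS, or ND-DS model from the paper. The function name `dawid_skene`, the `smoothing` parameter, and the toy votes are illustrative assumptions. The sketch uses expectation-maximization to estimate a class prior and one confusion matrix per worker, then computes a posterior over the true label of each item.

```python
from collections import defaultdict


def dawid_skene(annotations, n_classes, n_iter=50, smoothing=0.01):
    """Baseline Dawid-Skene EM (illustrative sketch, not the paper's models).

    annotations: list of (item, worker, label) with label in range(n_classes).
    Returns a dict mapping each item to its posterior over the true class.
    """
    items = sorted({i for i, _, _ in annotations})
    workers = sorted({w for _, w, _ in annotations})
    by_item = defaultdict(list)
    for i, w, l in annotations:
        by_item[i].append((w, l))

    # Initialise posteriors with per-item vote fractions (soft majority vote).
    post = {}
    for i in items:
        counts = [0.0] * n_classes
        for _, l in by_item[i]:
            counts[l] += 1.0
        total = sum(counts)
        post[i] = [c / total for c in counts]

    for _ in range(n_iter):
        # M-step: class prior and smoothed per-worker confusion matrices,
        # where conf[w][k][l] estimates P(worker w answers l | true class k).
        prior = [sum(post[i][k] for i in items) / len(items)
                 for k in range(n_classes)]
        conf = {w: [[smoothing] * n_classes for _ in range(n_classes)]
                for w in workers}
        for i in items:
            for w, l in by_item[i]:
                for k in range(n_classes):
                    conf[w][k][l] += post[i][k]
        for w in workers:
            for k in range(n_classes):
                row_sum = sum(conf[w][k])
                conf[w][k] = [v / row_sum for v in conf[w][k]]

        # E-step: posterior over the true class of each item.
        for i in items:
            scores = []
            for k in range(n_classes):
                s = prior[k]
                for w, l in by_item[i]:
                    s *= conf[w][k][l]
                scores.append(s)
            z = sum(scores)
            post[i] = [s / z for s in scores]
    return post


# Toy data (made up): workers a, b, c answer correctly, worker d always flips.
truth = [1, 0, 1, 0, 1, 0]
votes = []
for item, t in enumerate(truth):
    votes += [(item, "a", t), (item, "b", t), (item, "c", t), (item, "d", 1 - t)]

post = dawid_skene(votes, n_classes=2)
estimated = {i: max(range(2), key=lambda k: post[i][k]) for i in post}
```

Running this once per label with `n_classes=2` corresponds to estimating each label separately. One simple way to estimate labels jointly, which is an assumption here and not necessarily the paper's formulation, is to encode each combination of labels as a single class index and run the same routine over those combined classes.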
Similar resources
Finding Patterns in Noisy Crowds: Regression-based Annotation Aggregation for Crowdsourced Data
Crowdsourcing offers a convenient means of obtaining labeled data quickly and inexpensively. However, crowdsourced labels are often noisier than expert-annotated data, making it difficult to aggregate them meaningfully. We present an aggregation approach that learns a regression model from crowdsourced annotations to predict aggregated labels for instances that have no expert adjudications. The...
Soft Biometric Recognition from Comparative Crowdsourced Annotations
Soft biometrics provide cues that enable human identification from low quality video surveillance footage. This paper discusses a new crowdsourced dataset, collecting comparative soft biometric annotations from a rich set of human annotators. We now include gender as a comparative trait, and find comparative labels are more objective and obtain more accurate measurements than previous categoric...
Annotation models for crowdsourced ordinal data
In supervised learning, when acquiring good-quality labels is hard, practitioners resort to getting the data labeled by multiple noisy annotators. Various methods have been proposed to estimate the consensus labels for binary and categorical labels. A commonly used paradigm to annotate instances when the labels are inherently subjective is to use ordinal scales. In this paper we propose annotato...
Learning to Predict from Crowdsourced Data
Crowdsourcing services like Amazon’s Mechanical Turk have facilitated and greatly expedited the manual labeling process from a large number of human workers. However, spammers are often unavoidable and the crowdsourced labels can be very noisy. In this paper, we explicitly account for four sources for a noisy crowdsourced label: worker’s dedication to the task, his/her expertise, his/her defaul...
Detection of Musical Event Drop from Crowdsourced Annotations Using a Noisy Channel Model
This paper describes the algorithm for our submission to the MediaEval 2014 crowdsourcing challenge. We perform a Maximum Likelihood (ML) estimation of the true label, using only the multiple noisy labels. Each annotator’s decision is modeled by a die-toss based on which the annotator changes the true label. We learn parameters of this noisy channel model using the Expectation-Maximization algo...
Journal:
- Expert Syst. Appl.
Volume 41, Issue
Pages -
Publication date: 2014